Step 1.

Read the data, visualize it, and get familiar with the variables.

datapath<-"~/Documents/UChicago/Courses/Statistical Analysis/Final Project"
AssignmentData<-
  read.csv(file=paste(datapath,"regressionassignmentdata2014.csv",sep="/"),
           row.names=1,header=TRUE,sep=",")
head(AssignmentData)
##           USGG3M USGG6M USGG2YR USGG3YR USGG5YR USGG10YR USGG30YR  Output1
## 1/5/1981   13.52  13.09  12.289   12.28  12.294   12.152   11.672 18.01553
## 1/6/1981   13.58  13.16  12.429   12.31  12.214   12.112   11.672 18.09140
## 1/7/1981   14.50  13.90  12.929   12.78  12.614   12.382   11.892 19.44731
## 1/8/1981   14.76  14.00  13.099   12.95  12.684   12.352   11.912 19.74851
## 1/9/1981   15.20  14.30  13.539   13.28  12.884   12.572   12.132 20.57204
## 1/12/1981  15.22  14.23  13.179   12.94  12.714   12.452   12.082 20.14218
##           Easing Tightening
## 1/5/1981      NA         NA
## 1/6/1981      NA         NA
## 1/7/1981      NA         NA
## 1/8/1981      NA         NA
## 1/9/1981      NA         NA
## 1/12/1981     NA         NA

The first 7 variables (input variables) are the daily records of the US Treasury yields to maturity. Plot the input variables as below.

matplot(AssignmentData[,-c(8,9,10)],type='l')
nn <- ncol(AssignmentData[,-c(8,9,10)])
legend("topright", colnames(AssignmentData)[1:7],col=seq_len(nn),cex=0.8,fill=seq_len(nn))

Plot the input variables together with the output variable. The meaning of the variable Output1 will become clear later.

matplot(AssignmentData[,-c(9,10)],type='l')
nn <- ncol(AssignmentData[,-c(9,10)])
legend("topright", colnames(AssignmentData)[1:8],col=seq_len(nn),cex=0.8,fill=seq_len(nn))

Step 2.

Estimate a simple regression model of the output variable on each of the input variables given in AssignmentData.

Input1.linear.Model = lm(AssignmentData$Output1~AssignmentData$USGG3M)
Input2.linear.Model = lm(AssignmentData$Output1~AssignmentData$USGG6M)
Input3.linear.Model = lm(AssignmentData$Output1~AssignmentData$USGG2YR)
Input4.linear.Model = lm(AssignmentData$Output1~AssignmentData$USGG3YR)
Input5.linear.Model = lm(AssignmentData$Output1~AssignmentData$USGG5YR)
Input6.linear.Model = lm(AssignmentData$Output1~AssignmentData$USGG10YR)
Input7.linear.Model = lm(AssignmentData$Output1~AssignmentData$USGG30YR)

If necessary, check the names of the fields returned by the lm() and summary() functions. Analyze the summary.
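As a quick illustration of how to list those fields (using the built-in mtcars data purely as a stand-in, since any lm fit exposes the same components):

```r
# Stand-in example on built-in mtcars data: list fields of lm() and summary() objects
fit <- lm(mpg ~ wt, data = mtcars)
names(fit)           # "coefficients", "residuals", "fitted.values", ...
names(summary(fit))  # "r.squared", "adj.r.squared", "sigma", "coefficients", ...
```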

Check the significance of the estimated parameters and of the model as a whole, and the amount of variance explained. Store the coefficients for each input variable.

The following code gives an example of the analysis for the first input variable.

summary(Input1.linear.Model)
## 
## Call:
## lm(formula = AssignmentData$Output1 ~ AssignmentData$USGG3M)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9374 -1.2115 -0.0528  1.2640  7.7048 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -11.72318    0.03137  -373.7   <2e-16 ***
## AssignmentData$USGG3M   2.50756    0.00541   463.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.69 on 8298 degrees of freedom
## Multiple R-squared:  0.9628, Adjusted R-squared:  0.9628 
## F-statistic: 2.148e+05 on 1 and 8298 DF,  p-value: < 2.2e-16
c(Total.Variance=var(AssignmentData[,8]),Unexplained.Variance=summary(Input1.linear.Model)$sigma^2)
##       Total.Variance Unexplained.Variance 
##            76.804438             2.857058
Coefficients.Input1 = Input1.linear.Model$coefficients
Coefficients.Input1
##           (Intercept) AssignmentData$USGG3M 
##            -11.723184              2.507561
matplot(AssignmentData[,8],type="l",xaxt="n",col = "red")
lines(Input1.linear.Model$fitted.values,col="black")

Second input, USGG6M.

summary(Input2.linear.Model)
## 
## Call:
## lm(formula = AssignmentData$Output1 ~ AssignmentData$USGG6M)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7529 -1.0385  0.0224  1.1443  4.1509 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -12.097528   0.026469  -457.0   <2e-16 ***
## AssignmentData$USGG6M   2.497235   0.004445   561.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.403 on 8298 degrees of freedom
## Multiple R-squared:  0.9744, Adjusted R-squared:  0.9744 
## F-statistic: 3.157e+05 on 1 and 8298 DF,  p-value: < 2.2e-16
c(Total.Variance=var(AssignmentData[,8]),Unexplained.Variance=summary(Input2.linear.Model)$sigma^2)
##       Total.Variance Unexplained.Variance 
##            76.804438             1.967321
Coefficients.Input2 = Input2.linear.Model$coefficients
Coefficients.Input2
##           (Intercept) AssignmentData$USGG6M 
##            -12.097528              2.497235
matplot(AssignmentData[,8],type="l",xaxt="n",col = "red")
lines(Input2.linear.Model$fitted.values,col="grey")

Third input, USGG2YR.

summary(Input3.linear.Model)
## 
## Call:
## lm(formula = AssignmentData$Output1 ~ AssignmentData$USGG2YR)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.43277 -0.38356 -0.00578  0.43362  1.72564 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -13.055775   0.010031   -1302   <2e-16 ***
## AssignmentData$USGG2YR   2.400449   0.001532    1567   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5087 on 8298 degrees of freedom
## Multiple R-squared:  0.9966, Adjusted R-squared:  0.9966 
## F-statistic: 2.455e+06 on 1 and 8298 DF,  p-value: < 2.2e-16
c(Total.Variance=var(AssignmentData[,8]),Unexplained.Variance=summary(Input3.linear.Model)$sigma^2)
##       Total.Variance Unexplained.Variance 
##           76.8044379            0.2588092
Coefficients.Input3 = Input3.linear.Model$coefficients
Coefficients.Input3
##            (Intercept) AssignmentData$USGG2YR 
##             -13.055775               2.400449
matplot(AssignmentData[,8],type="l",xaxt="n",col = "red")
lines(Input3.linear.Model$fitted.values,col="green")

Fourth input, USGG3YR.

summary(Input4.linear.Model)
## 
## Call:
## lm(formula = AssignmentData$Output1 ~ AssignmentData$USGG3YR)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0160 -0.2459  0.0325  0.2638  3.0666 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -13.861618   0.008214   -1688   <2e-16 ***
## AssignmentData$USGG3YR   2.455793   0.001230    1996   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3996 on 8298 degrees of freedom
## Multiple R-squared:  0.9979, Adjusted R-squared:  0.9979 
## F-statistic: 3.984e+06 on 1 and 8298 DF,  p-value: < 2.2e-16
c(Total.Variance=var(AssignmentData[,8]),Unexplained.Variance=summary(Input4.linear.Model)$sigma^2)
##       Total.Variance Unexplained.Variance 
##            76.804438             0.159657
Coefficients.Input4 = Input4.linear.Model$coefficients
Coefficients.Input4
##            (Intercept) AssignmentData$USGG3YR 
##             -13.861618               2.455793
matplot(AssignmentData[,8],type="l",xaxt="n",col = "red")
lines(Input4.linear.Model$fitted.values,col="blue")

Fifth input, USGG5YR.

summary(Input5.linear.Model)
## 
## Call:
## lm(formula = AssignmentData$Output1 ~ AssignmentData$USGG5YR)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6517 -0.6216 -0.0147  0.6523  3.1720 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -15.436649   0.017819  -866.3   <2e-16 ***
## AssignmentData$USGG5YR   2.568742   0.002581   995.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7989 on 8298 degrees of freedom
## Multiple R-squared:  0.9917, Adjusted R-squared:  0.9917 
## F-statistic: 9.904e+05 on 1 and 8298 DF,  p-value: < 2.2e-16
c(Total.Variance=var(AssignmentData[,8]),Unexplained.Variance=summary(Input5.linear.Model)$sigma^2)
##       Total.Variance Unexplained.Variance 
##           76.8044379            0.6382572
Coefficients.Input5 = Input5.linear.Model$coefficients
Coefficients.Input5
##            (Intercept) AssignmentData$USGG5YR 
##             -15.436649               2.568742
matplot(AssignmentData[,8],type="l",xaxt="n",col = "red")
lines(Input5.linear.Model$fitted.values,col="cyan")

Sixth input, USGG10YR.

summary(Input6.linear.Model)
## 
## Call:
## lm(formula = AssignmentData$Output1 ~ AssignmentData$USGG10YR)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0334 -1.2810 -0.1944  1.3561  4.2254 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -18.063370   0.039182  -461.0   <2e-16 ***
## AssignmentData$USGG10YR   2.786991   0.005455   510.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.538 on 8298 degrees of freedom
## Multiple R-squared:  0.9692, Adjusted R-squared:  0.9692 
## F-statistic: 2.61e+05 on 1 and 8298 DF,  p-value: < 2.2e-16
c(Total.Variance=var(AssignmentData[,8]),Unexplained.Variance=summary(Input6.linear.Model)$sigma^2)
##       Total.Variance Unexplained.Variance 
##            76.804438             2.366752
Coefficients.Input6 = Input6.linear.Model$coefficients
Coefficients.Input6
##             (Intercept) AssignmentData$USGG10YR 
##              -18.063370                2.786991
matplot(AssignmentData[,8],type="l",xaxt="n",col = "red")
lines(Input6.linear.Model$fitted.values,col="Magenta")

Seventh input, USGG30YR.

summary(Input7.linear.Model)
## 
## Call:
## lm(formula = AssignmentData$Output1 ~ AssignmentData$USGG30YR)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6447 -1.8426 -0.4731  1.9529  4.8734 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -21.08591    0.06560  -321.4   <2e-16 ***
## AssignmentData$USGG30YR   3.06956    0.00886   346.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.229 on 8298 degrees of freedom
## Multiple R-squared:  0.9353, Adjusted R-squared:  0.9353 
## F-statistic: 1.2e+05 on 1 and 8298 DF,  p-value: < 2.2e-16
c(Total.Variance=var(AssignmentData[,8]),Unexplained.Variance=summary(Input7.linear.Model)$sigma^2)
##       Total.Variance Unexplained.Variance 
##            76.804438             4.967286
Coefficients.Input7 = Input7.linear.Model$coefficients
Coefficients.Input7
##             (Intercept) AssignmentData$USGG30YR 
##              -21.085905                3.069561
matplot(AssignmentData[,8],type="l",xaxt="n",col = "red")
lines(Input7.linear.Model$fitted.values,col="yellow")

Collect all slopes and intercepts in one table and print this table. Try to do it in one line using the apply() function.

lms.coefficients<- apply(AssignmentData[,1:7],2,function(z) lm(AssignmentData$Output1 ~ z)$coefficients)
lms.coefficients
##                 USGG3M     USGG6M    USGG2YR    USGG3YR    USGG5YR
## (Intercept) -11.723184 -12.097528 -13.055775 -13.861618 -15.436649
## z             2.507561   2.497235   2.400449   2.455793   2.568742
##               USGG10YR   USGG30YR
## (Intercept) -18.063370 -21.085905
## z             2.786991   3.069561
#alternative
#lms.coefficients<- sapply(1:7, function(z) lm(AssignmentData$Output1 ~ AssignmentData[,z])$coefficients)

Step 3.

Fit linear regression models using the single output (column 8, Output1) as the predictor and each of the original inputs as the response.

RInput1.linear.Model = lm(AssignmentData$USGG3M ~ AssignmentData$Output1)
RInput2.linear.Model = lm(AssignmentData$USGG6M ~ AssignmentData$Output1)
RInput3.linear.Model = lm(AssignmentData$USGG2YR ~ AssignmentData$Output1)
RInput4.linear.Model = lm(AssignmentData$USGG3YR ~ AssignmentData$Output1)
RInput5.linear.Model = lm(AssignmentData$USGG5YR ~ AssignmentData$Output1)
RInput6.linear.Model = lm(AssignmentData$USGG10YR ~ AssignmentData$Output1)
RInput7.linear.Model = lm(AssignmentData$USGG30YR ~ AssignmentData$Output1)

Collect all slopes and intercepts in one table and print this table.

reverse.lms.coefficients<- sapply(1:7, function(z) lm(AssignmentData[,z] ~ AssignmentData$Output1)$coefficients)
reverse.lms.coefficients
##                             [,1]     [,2]      [,3]      [,4]     [,5]
## (Intercept)            4.6751341 4.844370 5.4388879 5.6444580 6.009421
## AssignmentData$Output1 0.3839609 0.390187 0.4151851 0.4063541 0.386061
##                             [,6]      [,7]
## (Intercept)            6.4813160 6.8693554
## AssignmentData$Output1 0.3477544 0.3047124
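A useful check on these reverse regressions follows from a general property of simple regression: the forward slope times the reverse slope equals the squared correlation, \(b_{y|x}\,b_{x|y} = r^2\). For USGG3M, for instance, 2.507561 × 0.3839609 ≈ 0.9628, the \(R^2\) from Step 2. A minimal sketch on synthetic data:

```r
# Sketch on synthetic data: product of forward and reverse slopes equals r^2
set.seed(1)
x <- rnorm(100)
y <- 2.5 * x + rnorm(100)
b.forward <- coef(lm(y ~ x))[2]  # slope of y on x
b.reverse <- coef(lm(x ~ y))[2]  # slope of x on y
all.equal(as.numeric(b.forward * b.reverse), cor(x, y)^2)  # TRUE
```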

Step 4

Estimate a logistic regression using all inputs and the data on the Fed tightening and easing cycles.

AssignmentDataLogistic<-data.matrix(AssignmentData,rownames.force="automatic")

Prepare the easing-tightening data. Make the easing column equal to 0 during the easing periods and NA otherwise. Make the tightening column equal to 1 during the tightening periods and NA otherwise.

# Create columns of easing periods (as 0s) and tightening periods (as 1s)
EasingPeriods<-AssignmentDataLogistic[,9]
EasingPeriods[AssignmentDataLogistic[,9]==1]<-0
TighteningPeriods<-AssignmentDataLogistic[,10]
# Check easing and tightening periods
cbind(EasingPeriods,TighteningPeriods)[c(550:560,900:910,970:980),]
##            EasingPeriods TighteningPeriods
## 3/29/1983             NA                NA
## 3/30/1983             NA                NA
## 3/31/1983             NA                NA
## 4/4/1983              NA                 1
## 4/5/1983              NA                 1
## 4/6/1983              NA                 1
## 4/7/1983              NA                 1
## 4/8/1983              NA                 1
## 4/11/1983             NA                 1
## 4/12/1983             NA                 1
## 4/13/1983             NA                 1
## 8/27/1984             NA                 1
## 8/28/1984             NA                 1
## 8/29/1984             NA                 1
## 8/30/1984             NA                 1
## 8/31/1984             NA                 1
## 9/4/1984              NA                NA
## 9/5/1984              NA                NA
## 9/6/1984              NA                NA
## 9/7/1984              NA                NA
## 9/10/1984             NA                NA
## 9/11/1984             NA                NA
## 12/10/1984             0                NA
## 12/11/1984             0                NA
## 12/12/1984             0                NA
## 12/13/1984             0                NA
## 12/14/1984             0                NA
## 12/17/1984             0                NA
## 12/18/1984             0                NA
## 12/19/1984             0                NA
## 12/20/1984             0                NA
## 12/21/1984             0                NA
## 12/24/1984             0                NA

Remove the periods of neither easing nor tightening.

All.NAs<-is.na(EasingPeriods)&is.na(TighteningPeriods)
AssignmentDataLogistic.EasingTighteningOnly<-AssignmentDataLogistic
AssignmentDataLogistic.EasingTighteningOnly[,9]<-EasingPeriods
AssignmentDataLogistic.EasingTighteningOnly<-AssignmentDataLogistic.EasingTighteningOnly[!All.NAs,]
AssignmentDataLogistic.EasingTighteningOnly[is.na(AssignmentDataLogistic.EasingTighteningOnly[,10]),10]<-0 # remaining NAs in the tightening column fall within easing periods; recode them as 0
matplot(AssignmentDataLogistic.EasingTighteningOnly[,-c(9,10)],type="l",ylab="Data and Binary Fed Mode")
lines(AssignmentDataLogistic.EasingTighteningOnly[,10]*20,col="red") #times 20 for viewing


Estimate a logistic regression with the 3M yield as the predictor for the easing/tightening output.

LogisticModel.TighteningEasing_3M<-glm(AssignmentDataLogistic.EasingTighteningOnly[,10]~
                                         AssignmentDataLogistic.EasingTighteningOnly[,1],
                                       family=binomial(link=logit))
summary(LogisticModel.TighteningEasing_3M)
## 
## Call:
## glm(formula = AssignmentDataLogistic.EasingTighteningOnly[, 10] ~ 
##     AssignmentDataLogistic.EasingTighteningOnly[, 1], family = binomial(link = logit))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4239  -0.9014  -0.7737   1.3548   1.6743  
## 
## Coefficients:
##                                                  Estimate Std. Error
## (Intercept)                                      -2.15256    0.17328
## AssignmentDataLogistic.EasingTighteningOnly[, 1]  0.18638    0.02144
##                                                  z value Pr(>|z|)    
## (Intercept)                                      -12.422   <2e-16 ***
## AssignmentDataLogistic.EasingTighteningOnly[, 1]   8.694   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2983.5  on 2357  degrees of freedom
## Residual deviance: 2904.8  on 2356  degrees of freedom
## AIC: 2908.8
## 
## Number of Fisher Scoring iterations: 4
matplot(AssignmentDataLogistic.EasingTighteningOnly[,-c(9,10)],type="l",ylab="Data and Fitted Values")
lines(AssignmentDataLogistic.EasingTighteningOnly[,10]*20,col="red")
lines(LogisticModel.TighteningEasing_3M$fitted.values*20,col="green")

Now use all inputs as predictors for logistic regression.

LogisticModel.TighteningEasing_All<-glm(AssignmentDataLogistic.EasingTighteningOnly[,10]~
                                          AssignmentDataLogistic.EasingTighteningOnly[,-c(8,9,10)],
                                        family=binomial(link=logit))
summary(LogisticModel.TighteningEasing_All)
## 
## Call:
## glm(formula = AssignmentDataLogistic.EasingTighteningOnly[, 10] ~ 
##     AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)], 
##     family = binomial(link = logit))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2113  -0.8595  -0.5935   1.1306   2.5530  
## 
## Coefficients:
##                                                                     Estimate
## (Intercept)                                                          -4.7552
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3M    -3.3456
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG6M     4.1559
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG2YR    3.9460
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3YR   -3.4642
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG5YR   -3.2115
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG10YR  -0.9705
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG30YR   3.3254
##                                                                     Std. Error
## (Intercept)                                                             0.4312
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3M       0.2666
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG6M       0.3748
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG2YR      0.7554
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3YR      0.9340
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG5YR      0.7795
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG10YR     0.9764
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG30YR     0.6138
##                                                                     z value
## (Intercept)                                                         -11.029
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3M   -12.548
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG6M    11.089
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG2YR    5.224
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3YR   -3.709
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG5YR   -4.120
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG10YR  -0.994
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG30YR   5.418
##                                                                     Pr(>|z|)
## (Intercept)                                                          < 2e-16
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3M    < 2e-16
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG6M    < 2e-16
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG2YR  1.75e-07
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3YR  0.000208
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG5YR  3.79e-05
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG10YR 0.320214
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG30YR 6.04e-08
##                                                                        
## (Intercept)                                                         ***
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3M   ***
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG6M   ***
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG2YR  ***
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3YR  ***
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG5YR  ***
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG10YR    
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG30YR ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2983.5  on 2357  degrees of freedom
## Residual deviance: 2629.6  on 2350  degrees of freedom
## AIC: 2645.6
## 
## Number of Fisher Scoring iterations: 4

Explore the estimated model.

summary(LogisticModel.TighteningEasing_All)$aic
## [1] 2645.579
summary(LogisticModel.TighteningEasing_All)$coefficients[,c(1,4)]
##                                                                       Estimate
## (Intercept)                                                         -4.7551928
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3M   -3.3456116
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG6M    4.1558535
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG2YR   3.9460296
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3YR  -3.4642455
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG5YR  -3.2115319
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG10YR -0.9705444
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG30YR  3.3253517
##                                                                         Pr(>|z|)
## (Intercept)                                                         2.784283e-28
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3M   4.073045e-36
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG6M   1.422964e-28
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG2YR  1.751687e-07
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG3YR  2.080617e-04
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG5YR  3.786229e-05
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG10YR 3.202140e-01
## AssignmentDataLogistic.EasingTighteningOnly[, -c(8, 9, 10)]USGG30YR 6.036041e-08
matplot(AssignmentDataLogistic.EasingTighteningOnly[,-c(9,10)],type="l",ylab="Results of Logistic Regression")
lines(AssignmentDataLogistic.EasingTighteningOnly[,10]*20,col="red")
lines(LogisticModel.TighteningEasing_All$fitted.values*20,col="green")

Interpret the coefficients of the model and the fitted values.

Calculate and plot log-odds and probabilities. Compare probabilities with fitted values.

# Calculate log-odds (the linear predictor)
Log.Odds<-predict(LogisticModel.TighteningEasing_All)
plot(Log.Odds,type="l")

Probabilities<-1/(exp(-Log.Odds)+1)  # from log-odds ln(P/(1-P)) back to P
plot(LogisticModel.TighteningEasing_All$fitted.values,type="l",ylab="Fitted Values & Probabilities")
lines(Probabilities,col="red")

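The overlap of the two curves can also be confirmed numerically: the fitted values of a glm with a logit link are exactly the inverse logit of the linear predictor returned by predict(). A self-contained sketch on synthetic data (the mechanics are the same as for the model above):

```r
# Sketch: glm fitted values are the inverse logit of the linear predictor
set.seed(2)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 + x))
m <- glm(y ~ x, family = binomial(link = logit))
log.odds <- predict(m)             # linear predictor (log-odds)
probs <- 1 / (1 + exp(-log.odds))  # manual inverse logit
all.equal(as.numeric(probs), as.numeric(m$fitted.values))  # TRUE
```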

Step 5.

Compare linear regression models with different combinations of predictors. Select the best combination.

Below we show only two of the possible combinations: the full model, containing all 7 predictors, and the null model, containing only the intercept. Estimate other possible combinations.

# keep a data frame so that lm() can use column names in formulas
AssignmentDataRegressionComparison<-AssignmentData[,-c(9,10)]

Estimate the full model by using all 7 predictors.

RegressionModelComparison.Full <- lm(Output1 ~ ., data = AssignmentDataRegressionComparison)
names(summary(RegressionModelComparison.Full))
##  [1] "call"          "terms"         "residuals"     "coefficients" 
##  [5] "aliased"       "sigma"         "df"            "r.squared"    
##  [9] "adj.r.squared" "fstatistic"    "cov.unscaled"
#summary(RegressionModelComparison.Full)

Look at coefficients, \(R^2\), adjusted \(R^2\), degrees of freedom.

  1. Coefficients:
summary(RegressionModelComparison.Full)$coefficients
##                Estimate   Std. Error       t value Pr(>|t|)
## (Intercept) -14.9041591 1.056850e-10 -141024294891        0
## USGG3M        0.3839609 9.860401e-11    3893968285        0
## USGG6M        0.3901870 1.500111e-10    2601053702        0
## USGG2YR       0.4151851 2.569451e-10    1615851177        0
## USGG3YR       0.4063541 3.299038e-10    1231735395        0
## USGG5YR       0.3860610 2.618339e-10    1474449865        0
## USGG10YR      0.3477544 2.800269e-10    1241860763        0
## USGG30YR      0.3047124 1.566487e-10    1945195584        0
  2. \(R^2\), adjusted \(R^2\):
summary(RegressionModelComparison.Full)$r.squared
## [1] 1
summary(RegressionModelComparison.Full)$adj.r.squared
## [1] 1
  3. Degrees of freedom:
summary(RegressionModelComparison.Full)$df
## [1]    8 8292    8

Interpret the fitted model. How good is the fit? How significant are the parameters?

Too good: \(R^2 = 1\) and all parameters are extremely significant, which suggests that Output1 is an exact linear combination of the 7 inputs rather than a noisy response.
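This behavior is easy to reproduce: when the response is a noise-free linear combination of the predictors, lm() recovers it with zero residuals and \(R^2 = 1\). A minimal sketch on synthetic data (coefficients chosen arbitrarily):

```r
# Sketch: a noise-free linear combination of predictors yields R^2 = 1
set.seed(3)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- -2 + 0.4*d$x1 + 0.3*d$x2 + 0.3*d$x3  # no error term
fit <- lm(y ~ x1 + x2 + x3, data = d)
summary(fit)$r.squared    # 1 (up to floating-point rounding)
max(abs(residuals(fit)))  # essentially 0
```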

Estimate the null model, which includes only the intercept.

RegressionModelComparison.Null <- lm(Output1 ~ 1, data = AssignmentDataRegressionComparison)
summary(RegressionModelComparison.Null)
## 
## Call:
## lm(formula = Output1 ~ 1, data = AssignmentDataRegressionComparison)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.173  -6.509  -0.415   4.860  27.298 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.42e-11   9.62e-02       0        1
## 
## Residual standard error: 8.764 on 8299 degrees of freedom
  1. Coefficients:
summary(RegressionModelComparison.Null)$coefficients
##                 Estimate Std. Error      t value Pr(>|t|)
## (Intercept) 1.420082e-11 0.09619536 1.476248e-10        1
  2. \(R^2\), adjusted \(R^2\):
summary(RegressionModelComparison.Null)$r.squared
## [1] 0
summary(RegressionModelComparison.Null)$adj.r.squared
## [1] 0
  3. Degrees of freedom:
summary(RegressionModelComparison.Null)$df
## [1]    1 8299    1

Why does summary(RegressionModelComparison.Null) not show \(R^2\)?

Because the model contains only \(\beta_0\) and no slopes. As shown below, SSM is essentially zero, so SST = SSE and \(R^2\) = SSM/SST = 0: the model explains none of the variance.

(SSM = sum((RegressionModelComparison.Null$fitted.values - mean(RegressionModelComparison.Null$fitted.values))^2))
## [1] 6.052545e-24
(SSE = sum((AssignmentDataRegressionComparison$Output1 - RegressionModelComparison.Null$fitted.values)^2))
## [1] 637400
(SST = SSE+SSM)
## [1] 637400

Compare the models pairwise using anova().

anova(RegressionModelComparison.Full,RegressionModelComparison.Null)
## Analysis of Variance Table
## 
## Model 1: Output1 ~ USGG3M + USGG6M + USGG2YR + USGG3YR + USGG5YR + USGG10YR + 
##     USGG30YR
## Model 2: Output1 ~ 1
##   Res.Df    RSS Df Sum of Sq        F    Pr(>F)    
## 1   8292      0                                    
## 2   8299 637400 -7   -637400 3.73e+22 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpret the results of anova().

The F-test strongly rejects the null model, so not all slopes are zero. However, the full model explains all of the variation (RSS is essentially 0), which again looks like overfitting and might not make it the best choice.

Repeat the analysis for different combinations of input variables and select the one you think is the best. Explain your selection.

drop1(RegressionModelComparison.Full)
## Warning: attempting model selection on an essentially perfect fit is
## nonsense
## Single term deletions
## 
## Model:
## Output1 ~ USGG3M + USGG6M + USGG2YR + USGG3YR + USGG5YR + USGG10YR + 
##     USGG30YR
##          Df Sum of Sq    RSS     AIC
## <none>                 0.000 -336591
## USGG3M    1    37.016 37.016  -44911
## USGG6M    1    16.516 16.516  -51609
## USGG2YR   1     6.374  6.374  -59512
## USGG3YR   1     3.704  3.704  -64018
## USGG5YR   1     5.307  5.307  -61032
## USGG10YR  1     3.765  3.765  -63882
## USGG30YR  1     9.237  9.237  -56433
pairs(AssignmentDataRegressionComparison)

Because the full model fits essentially perfectly, drop1() (note the warning) and pairs() are not informative model selection tools here; instead, build the model forward from the null model with add1().
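The forward search performed manually with add1() can also be automated with step(), selecting by AIC; a minimal sketch on the built-in mtcars data (used as a stand-in):

```r
# Stand-in example: automated forward selection by AIC on built-in mtcars data
null.model <- lm(mpg ~ 1, data = mtcars)
forward.fit <- step(null.model, scope = ~ wt + hp + disp + qsec,
                    direction = "forward", trace = 0)
formula(forward.fit)  # model chosen by AIC
```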

(myScope<-names(AssignmentDataRegressionComparison)[-8])
## [1] "USGG3M"   "USGG6M"   "USGG2YR"  "USGG3YR"  "USGG5YR"  "USGG10YR"
## [7] "USGG30YR"
add1(RegressionModelComparison.Null,scope=myScope)
## Single term additions
## 
## Model:
## Output1 ~ 1
##          Df Sum of Sq    RSS    AIC
## <none>                637400  36033
## USGG3M    1    613692  23708   8715
## USGG6M    1    621075  16325   5618
## USGG2YR   1    635252   2148 -11217
## USGG3YR   1    636075   1325 -15226
## USGG5YR   1    632104   5296  -3725
## USGG10YR  1    617761  19639   7153
## USGG30YR  1    596181  41219  13306

USGG3YR gives the largest reduction in RSS and the lowest AIC, so it should be added first.

RegressionModelComparison.1 = lm(Output1 ~ USGG3YR, data = AssignmentDataRegressionComparison)
summary(RegressionModelComparison.1)
## 
## Call:
## lm(formula = Output1 ~ USGG3YR, data = AssignmentDataRegressionComparison)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0160 -0.2459  0.0325  0.2638  3.0666 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -13.861618   0.008214   -1688   <2e-16 ***
## USGG3YR       2.455793   0.001230    1996   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3996 on 8298 degrees of freedom
## Multiple R-squared:  0.9979, Adjusted R-squared:  0.9979 
## F-statistic: 3.984e+06 on 1 and 8298 DF,  p-value: < 2.2e-16

Note that the adjusted \(R^2\) is already extremely high (0.9979) with this single predictor.

(myScope<-names(AssignmentDataRegressionComparison)[-c(4,8)])
## [1] "USGG3M"   "USGG6M"   "USGG2YR"  "USGG5YR"  "USGG10YR" "USGG30YR"
add1(RegressionModelComparison.1,scope=myScope)
## Single term additions
## 
## Model:
## Output1 ~ USGG3YR
##          Df Sum of Sq     RSS    AIC
## <none>                1324.83 -15226
## USGG3M    1    407.98  916.85 -18279
## USGG6M    1    270.17 1054.66 -17117
## USGG2YR   1     82.93 1241.91 -15761
## USGG5YR   1     20.94 1303.89 -15356
## USGG10YR  1     64.05 1260.78 -15636
## USGG30YR  1    100.79 1224.04 -15881
RegressionModelComparison.2 = lm(Output1 ~ USGG3YR + USGG3M, data = AssignmentDataRegressionComparison)
summary(RegressionModelComparison.2)
## 
## Call:
## lm(formula = Output1 ~ USGG3YR + USGG3M, data = AssignmentDataRegressionComparison)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.19592 -0.25503  0.01678  0.24577  1.77420 
## 
## Coefficients:
##               Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) -13.671658   0.007515 -1819.36   <2e-16 ***
## USGG3YR       2.171934   0.004782   454.14   <2e-16 ***
## USGG3M        0.302081   0.004972    60.76   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3324 on 8297 degrees of freedom
## Multiple R-squared:  0.9986, Adjusted R-squared:  0.9986 
## F-statistic: 2.88e+06 on 2 and 8297 DF,  p-value: < 2.2e-16
anova(RegressionModelComparison.1,RegressionModelComparison.2)
## Analysis of Variance Table
## 
## Model 1: Output1 ~ USGG3YR
## Model 2: Output1 ~ USGG3YR + USGG3M
##   Res.Df     RSS Df Sum of Sq    F    Pr(>F)    
## 1   8298 1324.83                                
## 2   8297  916.85  1    407.98 3692 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA shows that adding USGG3M gives a highly significant improvement, so we keep both predictors.

(myScope<-names(AssignmentDataRegressionComparison)[-c(1,4,8)])
## [1] "USGG6M"   "USGG2YR"  "USGG5YR"  "USGG10YR" "USGG30YR"
add1(RegressionModelComparison.2,scope=myScope)
## Single term additions
## 
## Model:
## Output1 ~ USGG3YR + USGG3M
##          Df Sum of Sq    RSS    AIC
## <none>                916.85 -18279
## USGG6M    1    139.91 776.95 -19652
## USGG2YR   1    176.99 739.86 -20058
## USGG5YR   1    736.49 180.36 -31773
## USGG10YR  1    864.29  52.56 -42007
## USGG30YR  1    863.62  53.23 -41902

USGG10YR gives the lowest AIC (-42007), so we add it next.

RegressionModelComparison.3 = lm(Output1 ~ USGG3YR + USGG3M + USGG10YR, data = AssignmentDataRegressionComparison)
summary(RegressionModelComparison.3)
## 
## Call:
## lm(formula = Output1 ~ USGG3YR + USGG3M + USGG10YR, data = AssignmentDataRegressionComparison)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.42997 -0.04207  0.00512  0.04825  0.69394 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -14.723583   0.003369 -4370.4   <2e-16 ***
## USGG3YR       1.120774   0.003068   365.3   <2e-16 ***
## USGG3M        0.705571   0.001616   436.7   <2e-16 ***
## USGG10YR      0.786689   0.002130   369.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0796 on 8296 degrees of freedom
## Multiple R-squared:  0.9999, Adjusted R-squared:  0.9999 
## F-statistic: 3.353e+07 on 3 and 8296 DF,  p-value: < 2.2e-16
anova(RegressionModelComparison.2,RegressionModelComparison.3)
## Analysis of Variance Table
## 
## Model 1: Output1 ~ USGG3YR + USGG3M
## Model 2: Output1 ~ USGG3YR + USGG3M + USGG10YR
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   8297 916.85                                  
## 2   8296  52.56  1    864.29 136412 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA shows the added term is significant again, and the adjusted \(R^2\) is now close to 1. At this point we have one short-term (USGG3M), one medium-term (USGG3YR) and one long-term (USGG10YR) rate as predictors.

Relative importance measures

suppressMessages(library(relaimpo))
metrics.Full <- calc.relimp(RegressionModelComparison.Full, type = c("lmg", "first", "last","betasq", "pratt"))
## Warning in rev(variances[[p]]) - variances[[p + 1]]: Recycling array of length 1 in vector-array arithmetic is deprecated.
##   Use c() or as.vector() instead.
metrics.Full
## Response variable: Output1 
## Total response variance: 76.80444 
## Analysis based on 8300 observations 
## 
## 7 Regressors: 
## USGG3M USGG6M USGG2YR USGG3YR USGG5YR USGG10YR USGG30YR 
## Proportion of variance explained by model: 100%
## Metrics are not normalized (rela=FALSE). 
## 
## Relative importance metrics: 
## 
##                lmg         last     first      betasq      pratt
## USGG3M   0.1404709 5.807356e-05 0.9628054 0.022574051 0.14742597
## USGG6M   0.1425004 2.591148e-05 0.9743884 0.023788052 0.15224586
## USGG2YR  0.1466582 9.999916e-06 0.9966307 0.029814849 0.17237863
## USGG3YR  0.1468640 5.810700e-06 0.9979215 0.027322621 0.16512368
## USGG5YR  0.1457366 8.326330e-06 0.9916908 0.022399959 0.14904306
## USGG10YR 0.1417725 5.906625e-06 0.9691884 0.015089760 0.12093312
## USGG30YR 0.1359975 1.449173e-05 0.9353333 0.009217098 0.09284966
## 
## Average coefficients for different model sizes: 
## 
##                1X        2Xs       3Xs       4Xs       5Xs       6Xs
## USGG3M   2.507561  0.2263472 0.6109835 0.5227870 0.4628923 0.4211955
## USGG6M   2.497235  1.4470531 0.7276930 0.6852762 0.5880926 0.4868271
## USGG2YR  2.400449  1.8713892 1.2496809 0.9362129 0.6876128 0.4720868
## USGG3YR  2.455793  2.1971757 1.2775152 0.6434783 0.4689353 0.4963413
## USGG5YR  2.568742  1.9951251 1.0962316 0.7209641 0.5450640 0.4051945
## USGG10YR 2.786991  1.4935881 0.5874971 0.6759481 0.5704141 0.4630193
## USGG30YR 3.069561 -0.4030409 0.4839594 0.3866284 0.3539104 0.3276156
##                7Xs
## USGG3M   0.3839609
## USGG6M   0.3901870
## USGG2YR  0.4151851
## USGG3YR  0.4063541
## USGG5YR  0.3860610
## USGG10YR 0.3477544
## USGG30YR 0.3047124
(metrics.Full.rank<-metrics.Full@lmg.rank)
##   USGG3M   USGG6M  USGG2YR  USGG3YR  USGG5YR USGG10YR USGG30YR 
##        6        4        2        1        3        5        7

If we were to use three predictors, the relative importance measures (lmg ranks) would suggest a model with USGG3YR, USGG2YR and USGG5YR.

Selection of predictors based on regsubsets()

library(leaps)
subsetsFull<-regsubsets(x=AssignmentDataRegressionComparison[,1:7],y=AssignmentDataRegressionComparison[,8])
summary(subsetsFull)$which
##   (Intercept) USGG3M USGG6M USGG2YR USGG3YR USGG5YR USGG10YR USGG30YR
## 1        TRUE  FALSE  FALSE   FALSE    TRUE   FALSE    FALSE    FALSE
## 2        TRUE   TRUE  FALSE   FALSE   FALSE    TRUE    FALSE    FALSE
## 3        TRUE   TRUE  FALSE    TRUE   FALSE   FALSE     TRUE    FALSE
## 4        TRUE   TRUE   TRUE   FALSE    TRUE   FALSE     TRUE    FALSE
## 5        TRUE   TRUE   TRUE    TRUE   FALSE    TRUE    FALSE     TRUE
## 6        TRUE   TRUE   TRUE    TRUE   FALSE    TRUE     TRUE     TRUE
## 7        TRUE   TRUE   TRUE    TRUE    TRUE    TRUE     TRUE     TRUE

As can be seen, the three-predictor model given by regsubsets() uses USGG3M, USGG2YR and USGG10YR, which is very close to what we chose with add1().

Final selection: USGG3M + USGG3YR + USGG10YR, based on the add1() and regsubsets() results.
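As a sanity check, the two three-predictor candidates can be compared on adjusted \(R^2\); a quick sketch:

```r
# Candidate from relative importance (lmg) vs. candidate from add1()/regsubsets()
lmg.model  <- lm(Output1 ~ USGG3YR + USGG2YR + USGG5YR,
                 data = AssignmentDataRegressionComparison)
add1.model <- lm(Output1 ~ USGG3M + USGG3YR + USGG10YR,
                 data = AssignmentDataRegressionComparison)
c(lmg = summary(lmg.model)$adj.r.squared,
  add1 = summary(add1.model)$adj.r.squared)
```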

Step 6.

Perform rolling window analysis of the yields data. Use package zoo for rolling window analysis.

Set the window width and window shift parameters for rolling window.

Window.width<-20; Window.shift<-5

Compute rolling means using rollapply().

library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

Calculate rolling mean values for each variable.

# Means
all.means<-rollapply(AssignmentDataRegressionComparison,width=Window.width,by=Window.shift,by.column=TRUE, mean)
head(all.means,10)
##        USGG3M  USGG6M USGG2YR USGG3YR USGG5YR USGG10YR USGG30YR  Output1
##  [1,] 15.0405 14.0855 13.2795 12.9360 12.7825  12.5780  12.1515 20.14842
##  [2,] 15.1865 14.1440 13.4855 13.1085 12.9310  12.7370  12.3370 20.55208
##  [3,] 15.2480 14.2755 13.7395 13.3390 13.1500  12.9480  12.5500 21.04895
##  [4,] 14.9345 14.0780 13.7750 13.4765 13.2385  13.0515  12.6610 21.02611
##  [5,] 14.7545 14.0585 13.9625 13.6890 13.4600  13.2295  12.8335 21.31356
##  [6,] 14.6025 14.0115 14.0380 13.7790 13.5705  13.3050  12.8890 21.39061
##  [7,] 14.0820 13.5195 13.8685 13.6710 13.4815  13.1880  12.7660 20.77200
##  [8,] 13.6255 13.0635 13.6790 13.5735 13.4270  13.1260  12.6950 20.23626
##  [9,] 13.2490 12.6810 13.5080 13.4575 13.3680  13.0770  12.6470 19.76988
## [10,] 12.9545 12.4225 13.4140 13.4240 13.3850  13.1115  12.6755 19.53054
# The first mean uses rows 1-20, the next rows 6-25, and so on:
# floor((8300 - 20)/5) + 1 = 1657 windows, which is why all.means has 1657 rows.
# Create points at which rolling means are calculated
Count<-1:length(AssignmentDataRegressionComparison[,1]) #Count is a vector of 1:8300
Rolling.window.matrix<-rollapply(Count,width=Window.width,by=Window.shift,by.column=FALSE,
          FUN=function(z) z)
Rolling.window.matrix[1:10,] # rows entering each of the first 10 rolling windows
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
##  [1,]    1    2    3    4    5    6    7    8    9    10    11    12    13
##  [2,]    6    7    8    9   10   11   12   13   14    15    16    17    18
##  [3,]   11   12   13   14   15   16   17   18   19    20    21    22    23
##  [4,]   16   17   18   19   20   21   22   23   24    25    26    27    28
##  [5,]   21   22   23   24   25   26   27   28   29    30    31    32    33
##  [6,]   26   27   28   29   30   31   32   33   34    35    36    37    38
##  [7,]   31   32   33   34   35   36   37   38   39    40    41    42    43
##  [8,]   36   37   38   39   40   41   42   43   44    45    46    47    48
##  [9,]   41   42   43   44   45   46   47   48   49    50    51    52    53
## [10,]   46   47   48   49   50   51   52   53   54    55    56    57    58
##       [,14] [,15] [,16] [,17] [,18] [,19] [,20]
##  [1,]    14    15    16    17    18    19    20
##  [2,]    19    20    21    22    23    24    25
##  [3,]    24    25    26    27    28    29    30
##  [4,]    29    30    31    32    33    34    35
##  [5,]    34    35    36    37    38    39    40
##  [6,]    39    40    41    42    43    44    45
##  [7,]    44    45    46    47    48    49    50
##  [8,]    49    50    51    52    53    54    55
##  [9,]    54    55    56    57    58    59    60
## [10,]    59    60    61    62    63    64    65
# Take middle of each window
Points.of.calculation<-Rolling.window.matrix[,10]

Points.of.calculation[1:10]
##  [1] 10 15 20 25 30 35 40 45 50 55
length(Points.of.calculation)
## [1] 1657
# Insert the means into a full-length vector to plot the rolling mean over the original data
Means.forPlot<-rep(NA,length(AssignmentDataRegressionComparison[,1])) # a vector of NAs of length 8300
Means.forPlot[Points.of.calculation]<-all.means[,1] # positions 10, 15, 20, ..., 8290 receive the rolling means of USGG3M; all other entries stay NA
Means.forPlot[1:50]
##  [1]      NA      NA      NA      NA      NA      NA      NA      NA
##  [9]      NA 15.0405      NA      NA      NA      NA 15.1865      NA
## [17]      NA      NA      NA 15.2480      NA      NA      NA      NA
## [25] 14.9345      NA      NA      NA      NA 14.7545      NA      NA
## [33]      NA      NA 14.6025      NA      NA      NA      NA 14.0820
## [41]      NA      NA      NA      NA 13.6255      NA      NA      NA
## [49]      NA 13.2490
cbind(AssignmentDataRegressionComparison[,1],Means.forPlot)[1:50,]
##             Means.forPlot
##  [1,] 13.52            NA
##  [2,] 13.58            NA
##  [3,] 14.50            NA
##  [4,] 14.76            NA
##  [5,] 15.20            NA
##  [6,] 15.22            NA
##  [7,] 15.24            NA
##  [8,] 15.08            NA
##  [9,] 15.25            NA
## [10,] 15.15       15.0405
## [11,] 15.79            NA
## [12,] 15.32            NA
## [13,] 15.71            NA
## [14,] 15.72            NA
## [15,] 15.70       15.1865
## [16,] 15.35            NA
## [17,] 15.20            NA
## [18,] 15.06            NA
## [19,] 14.87            NA
## [20,] 14.59       15.2480
## [21,] 14.90            NA
## [22,] 14.85            NA
## [23,] 14.67            NA
## [24,] 14.74            NA
## [25,] 15.32       14.9345
## [26,] 15.52            NA
## [27,] 15.46            NA
## [28,] 15.54            NA
## [29,] 15.51            NA
## [30,] 15.14       14.7545
## [31,] 15.02            NA
## [32,] 14.48            NA
## [33,] 14.09            NA
## [34,] 14.23            NA
## [35,] 14.15       14.6025
## [36,] 14.20            NA
## [37,] 14.14            NA
## [38,] 14.22            NA
## [39,] 14.52            NA
## [40,] 14.39       14.0820
## [41,] 14.49            NA
## [42,] 14.51            NA
## [43,] 14.29            NA
## [44,] 14.16            NA
## [45,] 13.99       13.6255
## [46,] 13.92            NA
## [47,] 13.66            NA
## [48,] 13.21            NA
## [49,] 13.02            NA
## [50,] 12.95       13.2490
plot(Means.forPlot,col="red")
lines(AssignmentDataRegressionComparison[,1])
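The number of rolling windows can be verified directly from the window parameters:

```r
# Number of windows produced by rollapply(width = 20, by = 5) over 8300 rows
n <- 8300; width <- 20; shift <- 5
floor((n - width) / shift) + 1  # 1657, matching nrow(all.means)
```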

Compute the rolling standard deviation of the daily differences of each variable.

all.diff<- data.frame(diff(as.matrix(AssignmentDataRegressionComparison)))
head(all.diff,10)
##           USGG3M USGG6M USGG2YR USGG3YR USGG5YR USGG10YR USGG30YR
## 1/6/1981    0.06   0.07    0.14    0.03   -0.08    -0.04     0.00
## 1/7/1981    0.92   0.74    0.50    0.47    0.40     0.27     0.22
## 1/8/1981    0.26   0.10    0.17    0.17    0.07    -0.03     0.02
## 1/9/1981    0.44   0.30    0.44    0.33    0.20     0.22     0.22
## 1/12/1981   0.02  -0.07   -0.36   -0.34   -0.17    -0.12    -0.05
## 1/13/1981   0.02  -0.13    0.13    0.03   -0.03     0.08     0.00
## 1/14/1981  -0.16  -0.20   -0.35   -0.22   -0.07     0.00    -0.01
## 1/15/1981   0.17   0.19    0.30    0.27    0.16     0.09     0.18
## 1/16/1981  -0.10  -0.11   -0.17   -0.17   -0.11    -0.09    -0.12
## 1/19/1981   0.64   0.75    0.54    0.07    0.26     0.11     0.12
##               Output1
## 1/6/1981   0.07587222
## 1/7/1981   1.35591615
## 1/8/1981   0.30119608
## 1/9/1981   0.82353207
## 1/12/1981 -0.42985741
## 1/13/1981  0.03935812
## 1/14/1981 -0.40425521
## 1/15/1981  0.52159590
## 1/16/1981 -0.33130842
## 1/19/1981  0.96621424
rolling.sd = rollapply(all.diff,width=Window.width,by=Window.shift,by.column=TRUE, sd)
head(rolling.sd)
##         USGG3M    USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR  USGG30YR
## [1,] 0.3433258 0.3262462 0.2748258 0.2030161 0.1713192 0.1299585 0.1202147
## [2,] 0.2933383 0.2907504 0.2261811 0.1499219 0.1450082 0.1146895 0.1192201
## [3,] 0.2613180 0.2437530 0.2006201 0.1632596 0.1654110 0.1459308 0.1351909
## [4,] 0.2551754 0.2469663 0.1989446 0.1692794 0.1717219 0.1551052 0.1422183
## [5,] 0.2480551 0.2481595 0.2102004 0.1786057 0.1744767 0.1643960 0.1516540
## [6,] 0.1963884 0.2363672 0.2095082 0.1809180 0.1822917 0.1664956 0.1537351
##        Output1
## [1,] 0.5639875
## [2,] 0.4707427
## [3,] 0.4681168
## [4,] 0.4786189
## [5,] 0.4888569
## [6,] 0.4788897
rolling.dates<-rollapply(AssignmentDataRegressionComparison[-1,],width=Window.width,by=Window.shift,
                         by.column=FALSE,FUN=function(z) rownames(z))
head(rolling.dates)
##      [,1]        [,2]        [,3]        [,4]        [,5]       
## [1,] "1/6/1981"  "1/7/1981"  "1/8/1981"  "1/9/1981"  "1/12/1981"
## [2,] "1/13/1981" "1/14/1981" "1/15/1981" "1/16/1981" "1/19/1981"
## [3,] "1/20/1981" "1/21/1981" "1/22/1981" "1/23/1981" "1/26/1981"
## [4,] "1/27/1981" "1/28/1981" "1/29/1981" "1/30/1981" "2/2/1981" 
## [5,] "2/3/1981"  "2/4/1981"  "2/5/1981"  "2/6/1981"  "2/9/1981" 
## [6,] "2/10/1981" "2/11/1981" "2/13/1981" "2/17/1981" "2/18/1981"
##      [,6]        [,7]        [,8]        [,9]        [,10]      
## [1,] "1/13/1981" "1/14/1981" "1/15/1981" "1/16/1981" "1/19/1981"
## [2,] "1/20/1981" "1/21/1981" "1/22/1981" "1/23/1981" "1/26/1981"
## [3,] "1/27/1981" "1/28/1981" "1/29/1981" "1/30/1981" "2/2/1981" 
## [4,] "2/3/1981"  "2/4/1981"  "2/5/1981"  "2/6/1981"  "2/9/1981" 
## [5,] "2/10/1981" "2/11/1981" "2/13/1981" "2/17/1981" "2/18/1981"
## [6,] "2/19/1981" "2/20/1981" "2/23/1981" "2/24/1981" "2/25/1981"
##      [,11]       [,12]       [,13]       [,14]       [,15]      
## [1,] "1/20/1981" "1/21/1981" "1/22/1981" "1/23/1981" "1/26/1981"
## [2,] "1/27/1981" "1/28/1981" "1/29/1981" "1/30/1981" "2/2/1981" 
## [3,] "2/3/1981"  "2/4/1981"  "2/5/1981"  "2/6/1981"  "2/9/1981" 
## [4,] "2/10/1981" "2/11/1981" "2/13/1981" "2/17/1981" "2/18/1981"
## [5,] "2/19/1981" "2/20/1981" "2/23/1981" "2/24/1981" "2/25/1981"
## [6,] "2/26/1981" "2/27/1981" "3/2/1981"  "3/3/1981"  "3/4/1981" 
##      [,16]       [,17]       [,18]       [,19]       [,20]      
## [1,] "1/27/1981" "1/28/1981" "1/29/1981" "1/30/1981" "2/2/1981" 
## [2,] "2/3/1981"  "2/4/1981"  "2/5/1981"  "2/6/1981"  "2/9/1981" 
## [3,] "2/10/1981" "2/11/1981" "2/13/1981" "2/17/1981" "2/18/1981"
## [4,] "2/19/1981" "2/20/1981" "2/23/1981" "2/24/1981" "2/25/1981"
## [5,] "2/26/1981" "2/27/1981" "3/2/1981"  "3/3/1981"  "3/4/1981" 
## [6,] "3/5/1981"  "3/6/1981"  "3/9/1981"  "3/10/1981" "3/11/1981"
rownames(rolling.sd)<-rolling.dates[,10]
head(rolling.sd) # rolling SDs of daily differences, labeled by the middle date of each window
##              USGG3M    USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR
## 1/19/1981 0.3433258 0.3262462 0.2748258 0.2030161 0.1713192 0.1299585
## 1/26/1981 0.2933383 0.2907504 0.2261811 0.1499219 0.1450082 0.1146895
## 2/2/1981  0.2613180 0.2437530 0.2006201 0.1632596 0.1654110 0.1459308
## 2/9/1981  0.2551754 0.2469663 0.1989446 0.1692794 0.1717219 0.1551052
## 2/18/1981 0.2480551 0.2481595 0.2102004 0.1786057 0.1744767 0.1643960
## 2/25/1981 0.1963884 0.2363672 0.2095082 0.1809180 0.1822917 0.1664956
##            USGG30YR   Output1
## 1/19/1981 0.1202147 0.5639875
## 1/26/1981 0.1192201 0.4707427
## 2/2/1981  0.1351909 0.4681168
## 2/9/1981  0.1422183 0.4786189
## 2/18/1981 0.1516540 0.4888569
## 2/25/1981 0.1537351 0.4788897
matplot(rolling.sd[,c(1,5,7,8)],xaxt="n",type="l",col=c("black","red","blue","green")) # 3M, 5Y and 30Y and output
axis(side=1,at=1:1656,rownames(rolling.sd))

Show periods of high volatility.

high.volatility.periods<-rownames(rolling.sd)[rolling.sd[,8]>.5]
high.volatility.periods
##  [1] "1/19/1981"  "3/25/1981"  "4/1/1981"   "4/8/1981"   "4/23/1981" 
##  [6] "4/30/1981"  "5/7/1981"   "5/14/1981"  "5/21/1981"  "5/29/1981" 
## [11] "6/5/1981"   "6/12/1981"  "6/19/1981"  "6/26/1981"  "7/6/1981"  
## [16] "7/13/1981"  "7/20/1981"  "7/27/1981"  "10/28/1981" "11/5/1981" 
## [21] "11/13/1981" "11/20/1981" "11/30/1981" "12/7/1981"  "12/14/1981"
## [26] "12/29/1981" "1/14/1982"  "1/21/1982"  "1/28/1982"  "2/4/1982"  
## [31] "2/11/1982"  "7/29/1982"  "8/5/1982"   "8/12/1982"  "8/19/1982" 
## [36] "8/26/1982"  "9/24/1982"  "10/1/1982"  "10/8/1982"  "10/18/1982"
## [41] "10/13/1987" "10/20/1987" "10/27/1987" "11/19/2007" "11/26/2007"
## [46] "11/12/2008" "11/19/2008"

How is volatility related to the level of rates?

As can be seen in the chart, volatility is highest for the short-term rate USGG3M (black), lower for USGG5YR (red) and lowest for USGG30YR (blue). The high-volatility dates listed above also fall mostly in 1981-1982, when the level of rates was at its highest, so volatility tends to rise with the level of rates.
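The relation between volatility and the level of rates can be quantified by correlating each rate's rolling mean with its rolling SD of daily changes; a rough sketch that ignores the one-window offset between the two series (rolling.sd is computed on differences and has one window fewer than all.means):

```r
# Per-variable correlation between rolling level and rolling volatility
k <- nrow(rolling.sd)
sapply(colnames(rolling.sd),
       function(v) cor(all.means[1:k, v], rolling.sd[, v]))
```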

Fit linear model to rolling window data using 3 months, 5 years and 30 years variables as predictors.

# Rolling lm coefficients
Coefficients<-rollapply(AssignmentDataRegressionComparison,width=Window.width,by=Window.shift,by.column=FALSE, FUN=function(z) coef(lm(Output1~USGG3M+USGG5YR+USGG30YR,data=as.data.frame(z))))

rolling.dates<-rollapply(AssignmentDataRegressionComparison[,1:8],width=Window.width,by=Window.shift,by.column=FALSE, FUN=function(z) rownames(z)) # recomputed on the undifferenced data so the dates match the coefficient windows

rownames(Coefficients)<-rolling.dates[,10]
Coefficients[1:10,]
##           (Intercept)    USGG3M  USGG5YR  USGG30YR
## 1/16/1981   -15.70877 0.6723609 1.797680 0.2276011
## 1/23/1981   -15.96684 0.6948992 1.480514 0.5529139
## 1/30/1981   -16.77273 0.7078197 1.434388 0.6507280
## 2/6/1981    -16.90734 0.7279033 1.470083 0.6003377
## 2/17/1981   -17.46896 0.7343406 1.361289 0.7499705
## 2/24/1981   -17.04722 0.7357663 1.295641 0.7844908
## 3/3/1981    -17.67901 0.8544681 1.396653 0.5945022
## 3/10/1981   -17.72402 0.9162385 1.654274 0.2571200
## 3/17/1981   -17.00260 0.9265767 1.647852 0.1951273
## 3/24/1981   -16.84302 0.9102780 1.477727 0.3788401

Look at pairwise X-Y plots of regression coefficients for the 3M, 5Yr and 30Yr yields as inputs.

# Pairs plot of Coefficients
pairs(Coefficients)

Interpret the pairs plot.

Plot the coefficients. Show periods.

# Plot of coefficients
matplot(Coefficients[,-1],xaxt="n",type="l",col=c("black","red","green"))
axis(side=1,at=1:1657,rownames(Coefficients))

high.slopespread.periods<-rownames(Coefficients)[Coefficients[,3]-Coefficients[,4]>3] # USGG5YR and USGG30YR coefficient difference larger than 3
jump.slopes<-rownames(Coefficients)[Coefficients[,3]>3] #USGG5YR coefficient larger than 3 itself
high.slopespread.periods
##  [1] "4/26/1982"  "12/15/1982" "9/16/1985"  "5/12/1987"  "5/19/1987" 
##  [6] "5/27/1987"  "9/25/1987"  "3/15/1988"  "9/27/1988"  "10/5/1988" 
## [11] "3/10/1989"  "2/5/1992"   "8/3/1994"   "12/8/1994"  "6/14/1996" 
## [16] "5/9/1997"   "5/16/1997"  "5/23/1997"  "5/30/1997"  "12/26/2000"
## [21] "1/2/2001"   "7/25/2001"  "8/1/2001"   "11/13/2003" "8/12/2004" 
## [26] "12/16/2004"
jump.slopes
## [1] "12/16/2004"

Is the picture of coefficients consistent with the picture of pairs? If yes, explain why.

Yes. The USGG5YR (red) and USGG30YR (green) coefficients move in nearly opposite directions, matching the strong negative relation between them in pairs(), while the USGG3M (black) coefficient shows little relation to either.
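The visual impression from pairs() can be checked numerically with the correlation matrix of the rolling slopes; a sketch:

```r
# Correlations between the rolling regression coefficients (intercept dropped)
round(cor(Coefficients[, -1]), 2)
```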

How often is the R-squared not considered high?

# R-squared
r.squared<-rollapply(AssignmentDataRegressionComparison,width=Window.width,by=Window.shift,by.column=FALSE,
         FUN=function(z) summary(lm(Output1~USGG3M+USGG5YR+USGG30YR,data=as.data.frame(z)))$r.squared)
r.squared<-cbind(rolling.dates[,10],r.squared)
r.squared[1:10,]
##                   r.squared          
##  [1,] "1/16/1981" "0.995046300986446"
##  [2,] "1/23/1981" "0.992485868136646"
##  [3,] "1/30/1981" "0.998641209587999"
##  [4,] "2/6/1981"  "0.998849080081881"
##  [5,] "2/17/1981" "0.997958757207598"
##  [6,] "2/24/1981" "0.996489757136839"
##  [7,] "3/3/1981"  "0.99779753570421" 
##  [8,] "3/10/1981" "0.998963395226792"
##  [9,] "3/17/1981" "0.998729445388789"
## [10,] "3/24/1981" "0.997073000898673"
plot(r.squared[,2],xaxt="n",ylim=c(0,1))
axis(side=1,at=1:1657,rownames(Coefficients))

(low.r.squared.periods<-r.squared[r.squared[,2]<.9,1])
## [1] "6/24/1987" "6/27/1991" "4/28/2005" "6/20/2012"

What could cause a decrease of \(R^2\)? Within a 20-day window the rates may move very little, so noise dominates the fit, or the yield curve may twist in a way that the three selected maturities temporarily fail to capture.

Analyze the rolling p-values.

# P-values
Pvalues<-rollapply(AssignmentDataRegressionComparison,width=Window.width,by=Window.shift,by.column=FALSE,
                        FUN=function(z) summary(lm(Output1~USGG3M+USGG5YR+USGG30YR,data=as.data.frame(z)))$coefficients[,4])
rownames(Pvalues)<-rolling.dates[,10]
Pvalues[1:10,]
##            (Intercept)       USGG3M      USGG5YR     USGG30YR
## 1/16/1981 1.193499e-10 3.764585e-10 2.391260e-07 2.538852e-01
## 1/23/1981 3.751077e-12 1.008053e-11 2.447369e-07 5.300949e-03
## 1/30/1981 3.106359e-18 1.406387e-14 4.040035e-09 3.626961e-05
## 2/6/1981  2.591522e-19 3.360104e-19 3.828054e-11 2.221691e-05
## 2/17/1981 1.897239e-16 6.578118e-17 1.461743e-09 1.331767e-04
## 2/24/1981 2.341158e-13 1.000212e-13 9.008221e-07 4.733543e-03
## 3/3/1981  5.435581e-14 1.535503e-11 3.357199e-06 6.010473e-02
## 3/10/1981 6.227624e-16 1.178498e-16 1.679479e-05 3.851840e-01
## 3/17/1981 9.592582e-17 7.065226e-20 1.459692e-05 5.025726e-01
## 3/24/1981 8.248747e-16 6.689840e-16 6.413371e-04 3.052705e-01
matplot(Pvalues,xaxt="n",col=c("black","blue","red","green"),type="o")
axis(side=1,at=1:1657,rownames(Coefficients))

rownames(Pvalues)[Pvalues[,2]>.5] # windows in which the USGG3M coefficient is clearly insignificant (p > 0.5)
##  [1] "7/15/1992"  "7/26/1996"  "8/2/1996"   "11/7/2000"  "5/30/2001" 
##  [6] "5/2/2002"   "5/16/2002"  "5/23/2002"  "1/30/2003"  "2/6/2003"  
## [11] "7/24/2003"  "7/31/2003"  "8/7/2003"   "11/20/2003" "12/18/2003"
## [16] "4/28/2005"  "2/10/2006"  "3/9/2007"   "3/16/2007"  "7/21/2009" 
## [21] "10/6/2009"  "10/13/2009" "12/28/2010" "1/11/2011"  "3/1/2011"  
## [26] "11/16/2011" "11/23/2011" "5/23/2012"  "7/11/2012"  "6/6/2013"  
## [31] "1/16/2014"  "1/30/2014"  "3/6/2014"
rownames(Pvalues)[Pvalues[,3]>.5] # windows in which the USGG5YR coefficient is clearly insignificant
## [1] "12/1/1982" "3/16/1987" "4/28/1987" "6/24/1987" "9/3/1987"  "9/11/1987"
## [7] "9/20/1988" "12/3/1999"
rownames(Pvalues)[Pvalues[,4]>.5] # windows in which the USGG30YR coefficient is clearly insignificant
##   [1] "3/17/1981"  "4/22/1981"  "4/29/1981"  "6/4/1981"   "10/13/1981"
##   [6] "11/19/1981" "2/3/1982"   "2/26/1982"  "4/2/1982"   "4/12/1982" 
##  [11] "5/3/1982"   "7/7/1982"   "9/1/1982"   "9/23/1982"  "12/8/1982" 
##  [16] "2/18/1983"  "3/1/1983"   "5/4/1983"   "12/30/1983" "1/9/1984"  
##  [21] "2/6/1984"   "3/14/1984"  "3/21/1984"  "4/11/1984"  "6/22/1984" 
##  [26] "6/29/1984"  "10/31/1984" "11/16/1984" "11/26/1984" "12/17/1984"
##  [31] "3/25/1985"  "5/14/1985"  "5/21/1985"  "8/30/1985"  "9/9/1985"  
##  [36] "9/23/1985"  "10/1/1985"  "10/8/1985"  "10/16/1985" "10/23/1985"
##  [41] "10/30/1985" "11/14/1985" "11/21/1985" "1/22/1986"  "1/29/1986" 
##  [46] "5/5/1987"   "12/16/1987" "1/25/1988"  "2/1/1988"   "2/16/1988" 
##  [51] "3/1/1988"   "3/22/1988"  "5/18/1988"  "6/3/1988"   "6/10/1988" 
##  [56] "6/24/1988"  "7/25/1988"  "8/15/1988"  "12/5/1988"  "2/2/1989"  
##  [61] "3/3/1989"   "4/10/1989"  "5/1/1989"   "6/13/1989"  "8/16/1989" 
##  [66] "9/14/1989"  "9/21/1989"  "10/3/1989"  "10/11/1989" "10/18/1989"
##  [71] "11/1/1989"  "11/30/1989" "12/7/1989"  "1/8/1990"   "1/16/1990" 
##  [76] "6/15/1990"  "7/30/1990"  "8/6/1990"   "10/2/1990"  "10/10/1990"
##  [81] "11/23/1990" "3/1/1991"   "5/13/1991"  "5/20/1991"  "6/13/1991" 
##  [86] "7/5/1991"   "7/19/1991"  "9/16/1991"  "2/12/1992"  "2/20/1992" 
##  [91] "3/12/1992"  "4/16/1992"  "4/24/1992"  "5/1/1992"   "5/8/1992"  
##  [96] "6/2/1992"   "6/9/1992"   "6/16/1992"  "8/19/1992"  "8/26/1992" 
## [101] "10/23/1992" "5/20/1993"  "6/11/1993"  "6/18/1993"  "8/30/1993" 
## [106] "12/15/1993" "12/22/1993" "3/16/1994"  "3/30/1994"  "4/6/1994"  
## [111] "4/20/1994"  "4/27/1994"  "6/29/1994"  "8/17/1994"  "9/21/1994" 
## [116] "9/28/1994"  "12/22/1994" "12/29/1994" "1/5/1995"   "1/12/1995" 
## [121] "1/26/1995"  "2/2/1995"   "2/9/1995"   "4/6/1995"   "4/13/1995" 
## [126] "8/25/1995"  "9/29/1995"  "10/27/1995" "11/3/1995"  "11/10/1995"
## [131] "12/29/1995" "1/5/1996"   "1/12/1996"  "1/19/1996"  "3/29/1996" 
## [136] "5/31/1996"  "6/21/1996"  "7/12/1996"  "7/19/1996"  "7/26/1996" 
## [141] "8/2/1996"   "8/9/1996"   "8/16/1996"  "8/23/1996"  "9/27/1996" 
## [146] "10/4/1996"  "12/6/1996"  "2/28/1997"  "3/7/1997"   "4/18/1997" 
## [151] "4/25/1997"  "5/2/1997"   "6/13/1997"  "6/20/1997"  "6/27/1997" 
## [156] "7/4/1997"   "10/10/1997" "10/17/1997" "12/12/1997" "12/19/1997"
## [161] "12/26/1997" "1/9/1998"   "1/16/1998"  "8/14/1998"  "8/21/1998" 
## [166] "8/28/1998"  "9/18/1998"  "9/25/1998"  "12/4/1998"  "12/11/1998"
## [171] "1/8/1999"   "3/12/1999"  "4/2/1999"   "5/14/1999"  "6/25/1999" 
## [176] "7/9/1999"   "7/30/1999"  "8/20/1999"  "9/10/1999"  "9/24/1999" 
## [181] "10/15/1999" "12/31/1999" "3/3/2000"   "3/31/2000"  "4/7/2000"  
## [186] "4/14/2000"  "4/21/2000"  "7/25/2000"  "11/21/2000" "11/28/2000"
## [191] "3/7/2001"   "5/30/2001"  "7/11/2001"  "10/11/2001" "12/13/2001"
## [196] "1/24/2002"  "1/31/2002"  "8/29/2002"  "9/26/2002"  "12/26/2002"
## [201] "1/16/2003"  "1/23/2003"  "3/6/2003"   "3/13/2003"  "3/20/2003" 
## [206] "3/27/2003"  "5/1/2003"   "6/19/2003"  "6/26/2003"  "7/3/2003"  
## [211] "9/18/2003"  "9/25/2003"  "10/16/2003" "10/23/2003" "10/30/2003"
## [216] "11/20/2003" "1/1/2004"   "2/5/2004"   "2/26/2004"  "3/4/2004"  
## [221] "4/15/2004"  "4/22/2004"  "5/13/2004"  "5/27/2004"  "6/3/2004"  
## [226] "6/17/2004"  "6/24/2004"  "7/8/2004"   "10/14/2004" "10/21/2004"
## [231] "10/28/2004" "11/18/2004" "12/23/2004" "3/17/2005"  "3/24/2005" 
## [236] "3/31/2005"  "4/7/2005"   "7/21/2005"  "8/11/2005"  "9/22/2005" 
## [241] "10/6/2005"  "11/3/2005"  "12/8/2005"  "12/15/2005" "1/20/2006" 
## [246] "5/12/2006"  "5/19/2006"  "5/26/2006"  "6/2/2006"   "6/9/2006"  
## [251] "6/16/2006"  "7/7/2006"   "7/21/2006"  "10/6/2006"  "11/3/2006" 
## [256] "1/12/2007"  "2/23/2007"  "3/16/2007"  "4/27/2007"  "5/25/2007" 
## [261] "9/7/2007"   "9/14/2007"  "9/21/2007"  "9/28/2007"  "11/16/2007"
## [266] "5/5/2009"   "6/1/2010"   "6/15/2010"  "1/25/2011"  "2/1/2011"  
## [271] "6/12/2014"

Interpret the plot. The USGG3M and USGG5YR coefficients are almost always highly significant, while the USGG30YR coefficient is frequently insignificant (271 windows with p > 0.5), so the long-term rate contributes the least within short windows.
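One way to summarize the p-value plot is the fraction of windows in which each coefficient is insignificant; a sketch at the conventional 5% level:

```r
# Share of rolling windows with p-value above 0.05, per coefficient
round(colMeans(Pvalues > 0.05), 3)
```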

Step 7.

Perform PCA with the inputs (columns 1-7).

AssignmentData.Output<-AssignmentData$Output1
AssignmentData<-data.matrix(AssignmentData[,1:7],rownames.force="automatic")
dim(AssignmentData)
## [1] 8300    7
head(AssignmentData)
##           USGG3M USGG6M USGG2YR USGG3YR USGG5YR USGG10YR USGG30YR
## 1/5/1981   13.52  13.09  12.289   12.28  12.294   12.152   11.672
## 1/6/1981   13.58  13.16  12.429   12.31  12.214   12.112   11.672
## 1/7/1981   14.50  13.90  12.929   12.78  12.614   12.382   11.892
## 1/8/1981   14.76  14.00  13.099   12.95  12.684   12.352   11.912
## 1/9/1981   15.20  14.30  13.539   13.28  12.884   12.572   12.132
## 1/12/1981  15.22  14.23  13.179   12.94  12.714   12.452   12.082
AD.Predictors.PCA<-princomp(AssignmentData)

Plot factors and their explained relative variance

plot(AD.Predictors.PCA)

barplot(AD.Predictors.PCA$sdev^2/sum(AD.Predictors.PCA$sdev^2),ylim=c(0,1))

The cumulative proportions of explained variance are:

cumsum(AD.Predictors.PCA$sdev^2/sum(AD.Predictors.PCA$sdev^2))
##    Comp.1    Comp.2    Comp.3    Comp.4    Comp.5    Comp.6    Comp.7 
## 0.9783429 0.9981063 0.9996652 0.9998455 0.9999515 0.9999802 1.0000000

Comp.1 alone explains about 97.8% of the total variation of the predictors.

Visualize the loadings

(AD.PCALoadings<-AD.Predictors.PCA$loadings)
## 
## Loadings:
##          Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## USGG3M   -0.384  0.507  0.530 -0.404  0.386              
## USGG6M   -0.390  0.439  0.111  0.405 -0.679              
## USGG2YR  -0.415  0.111 -0.419  0.409  0.379 -0.298  0.490
## USGG3YR  -0.406        -0.448         0.236  0.198 -0.732
## USGG5YR  -0.386 -0.231 -0.246 -0.534 -0.287  0.421  0.439
## USGG10YR -0.348 -0.432  0.150 -0.199 -0.256 -0.736 -0.153
## USGG30YR -0.305 -0.544  0.498  0.421  0.207  0.378       
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000
## Proportion Var  0.143  0.143  0.143  0.143  0.143  0.143  0.143
## Cumulative Var  0.143  0.286  0.429  0.571  0.714  0.857  1.000
matplot(1:7,AD.Predictors.PCA$loadings,type="l",lty=1:7,lwd=1,xaxt="n",xlab="Predictor",ylab="Factor Loadings",ylim=c(-0.8,0.8),col=c("black","red","blue","green","cyan","orange","magenta"))
abline(h=0)
axis(1, 1:7,labels=colnames(AssignmentData))
legend("topright",legend=c("L1","L2","L3","L4","L5","L6","L7"),lty=1:7,lwd=1,cex=.7,col=c("black","red","blue","green","cyan","orange","magenta"))

Comp.1 has negative loadings on all predictors, so it captures parallel moves of all rates up or down together (the level of the curve). Comp.2 captures opposite movements of short-term and long-term rates (the slope), and Comp.3 opposite movements of medium-term rates versus the short and long ends (the curvature).
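Since Comp.1 explains about 98% of the variance, the whole yield curve can be approximated reasonably well from that single factor; a sketch of the reconstruction:

```r
# Reconstruct the yields from Comp.1 only: X ~ center + F1 %*% t(L1)
center <- AD.Predictors.PCA$center
L1 <- AD.Predictors.PCA$loadings[, 1]
F1 <- AD.Predictors.PCA$scores[, 1]
recon1 <- sweep(outer(F1, L1), 2, center, "+")
# Mean absolute reconstruction error per maturity (in percentage points)
round(colMeans(abs(recon1 - AssignmentData)), 3)
```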

Visualize Factors

AD.PCAFactors<-AD.Predictors.PCA$scores
#Plot
matplot(AD.Predictors.PCA$scores,type="l",lty=1:7,lwd=1,ylim=c(-30,20),col=c("black","red","blue","green","cyan","orange","magenta"))
legend("topleft",ncol=2, legend=c("F1","F2","F3","F4","F5","F6","F7"),lty=1:7,lwd=1,col=c("black","red","blue","green","cyan","orange","magenta"))

Visualize correlations

AD.Rotated<-as.data.frame(cbind(Output1=AssignmentData.Output,AD.PCAFactors))
pairs(as.matrix(AD.Rotated))

From the pairs plot, Output1 is almost perfectly negatively correlated with Comp.1 and shows essentially no relation to the other components. The components themselves are uncorrelated with each other by construction, so the apparent patterns among them in the pairs plot should not be read as linear correlations.

(rSqrCorrelations<-apply(AD.Predictors.PCA$scores,2,cor,AssignmentData.Output)^2)
##       Comp.1       Comp.2       Comp.3       Comp.4       Comp.5 
## 1.000000e+00 1.557347e-24 3.341241e-24 1.293688e-26 5.060602e-24 
##       Comp.6       Comp.7 
## 1.280434e-24 1.941283e-24
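The PCA factors are mutually uncorrelated by construction, which is worth verifying before reading patterns into the pairs plot; a quick check:

```r
# Off-diagonal correlations between factor scores should be (numerically) zero
round(cor(AD.PCAFactors), 6)
```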

Calculate relative importance measures for the PCA factors.

AD.lm.PCA.full<-lm(Output1~.,  data=AD.Rotated)
metrics.AD.pca <- calc.relimp(AD.lm.PCA.full, type = c("lmg", "first", "last","betasq", "pratt"))
## Warning in rev(variances[[p]]) - variances[[p + 1]]: Recycling array of length 1 in vector-array arithmetic is deprecated.
##   Use c() or as.vector() instead.
(metrics.AD.pca@lmg.rank)
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 
##    1.0    4.5    4.5    4.5    4.5    4.5    4.5
subsets.AD.Rotated<-regsubsets(x=AD.PCAFactors,y=AD.Rotated$Output1)
summary(subsets.AD.Rotated)$which
##   (Intercept) Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## 1        TRUE   TRUE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE
## 2        TRUE   TRUE  FALSE  FALSE  FALSE   TRUE  FALSE  FALSE
## 3        TRUE   TRUE  FALSE   TRUE  FALSE   TRUE  FALSE  FALSE
## 4        TRUE   TRUE  FALSE   TRUE  FALSE   TRUE  FALSE   TRUE
## 5        TRUE   TRUE   TRUE   TRUE  FALSE   TRUE  FALSE   TRUE
## 6        TRUE   TRUE   TRUE   TRUE  FALSE   TRUE   TRUE   TRUE
## 7        TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE

Indeed, as the correlation chart suggested, Comp.1 is essential for predicting Output1; `regsubsets` selects Comp.5 as the next variable to enter, although its marginal contribution is negligible.
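Beyond the inclusion table, the `summary` of a `regsubsets` fit also reports fit statistics for each model size, which makes the dominance of Comp.1 explicit (a sketch reusing `subsets.AD.Rotated` from above):

```r
subsets.summary <- summary(subsets.AD.Rotated)

# R-squared and BIC for models of size 1..7: the one-variable model
# (Comp.1 only) should already attain an R-squared of essentially 1
cbind(size = seq_along(subsets.summary$rsq),
      rsq  = subsets.summary$rsq,
      bic  = subsets.summary$bic)
```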

AD.lm.PCA.1<-lm(Output1 ~ Comp.1, data=AD.Rotated)
summary(AD.lm.PCA.1)
## 
## Call:
## lm(formula = Output1 ~ Comp.1, data = AD.Rotated)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -5.011e-09 -3.478e-10 -6.800e-12  3.296e-10  5.029e-09 
## 
## Coefficients:
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)  1.421e-11  1.715e-11  8.290e-01    0.407    
## Comp.1      -1.000e+00  1.957e-12 -5.111e+11   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.562e-09 on 8298 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 2.612e+23 on 1 and 8298 DF,  p-value: < 2.2e-16
AD.lm.PCA.15<-lm(Output1 ~ Comp.1 + Comp.5, data=AD.Rotated)
summary(AD.lm.PCA.15)
## 
## Call:
## lm(formula = Output1 ~ Comp.1 + Comp.5, data = AD.Rotated)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -5.033e-09 -3.468e-10 -5.000e-12  3.337e-10  5.036e-09 
## 
## Coefficients:
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)  1.421e-11  1.715e-11  8.290e-01    0.407    
## Comp.1      -1.000e+00  1.957e-12 -5.111e+11   <2e-16 ***
## Comp.5       2.196e-10  1.880e-10  1.168e+00    0.243    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.562e-09 on 8297 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.306e+23 on 2 and 8297 DF,  p-value: < 2.2e-16

Comparing the fits with and without Comp.5 confirms that Comp.1 alone is sufficient: its slope is exactly -1, the intercept is indistinguishable from zero, and the adjusted R-squared equals 1, with residuals on the order of 1e-9 (numerical noise). Rather than overfitting, this reveals the meaning of Output1: it is simply the negative of the first principal component of the yields. This single-factor representation is a considerable simplification compared with selecting 1, 2 or 3 of the original predictors as in Step 5.
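The conclusion can be checked directly by comparing the output variable with the negated first factor (a sketch assuming `AssignmentData.Output` and `AD.PCAFactors` from above; the sign of `princomp` scores is arbitrary, so the factor may need negating in the opposite direction on a different run):

```r
# Output1 coincides with the (negated) first principal component,
# up to numerical round-off
all.equal(as.numeric(AssignmentData.Output),
          as.numeric(-AD.PCAFactors[, "Comp.1"]),
          check.attributes = FALSE)

# Maximum absolute discrepancy should be of the same order as the
# regression residuals above (~1e-9)
max(abs(AssignmentData.Output + AD.PCAFactors[, "Comp.1"]))
```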